Set characterization-selection towards classification based on interaction index

Authors

  • Javier Murillo
  • Serge Guillaume
  • Flavio E. Spetale
  • Elizabeth Tapia
  • Pilar Bulacio
Abstract

In many real-world datasets, both the individual and the coordinated action of features may be relevant for class identification. In this paper, a computational strategy for relevant feature selection based on the characterization of redundant or complementary features is proposed. The characterization is achieved using fuzzy measures and an interaction index computed from fuzzy measure coefficients. Fuzzy measure identification requires raw data to be turned into confidence degrees. This key step is carried out considering the distributions of feature values across all the classes. Fuzzy measure coefficients are then estimated with an improved version of the Heuristic Least Mean Squares algorithm that includes an efficient management of untouched coefficients. Then, a generalization of the Shapley index for an arbitrary number of features is used. Simulation experiments on synthetic datasets are performed to study the behavior of this generalized interaction index. For extreme datasets, containing either redundant or complementary features as well as noise, the index value is given by a mathematical formula. This result is used to motivate feature selection guidelines that take into account feature interactions. Experimental results on benchmark datasets show that the proposal allows for the design of compact, interpretable and competitive classification models.
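
To make the idea of an interaction index computed from fuzzy measure coefficients concrete, the following is a minimal sketch. It assumes the generalized index is the Grabisch interaction index, which extends the Shapley index from single features to arbitrary feature groups; the abstract does not confirm that this is the exact definition used in the paper. The function names and the two toy fuzzy measures are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def interaction_index(mu, features, group):
    """Grabisch interaction index of `group` for the fuzzy measure `mu`.

    `mu` maps every frozenset of features to a coefficient in [0, 1].
    For a single-feature group this reduces to the Shapley index.
    (Assumed definition; the paper's exact index may differ.)
    """
    n = len(features)
    a = len(group)
    group = frozenset(group)
    rest = frozenset(features) - group
    total = 0.0
    for k_set in subsets(rest):
        k = len(k_set)
        weight = factorial(n - k - a) * factorial(k) / factorial(n - a + 1)
        # Discrete derivative of mu with respect to `group`, taken at k_set.
        delta = sum((-1) ** (a - len(l_set)) * mu[k_set | l_set]
                    for l_set in subsets(group))
        total += weight * delta
    return total


FEATURES = ("x1", "x2")

# Hypothetical toy measures: either feature alone suffices (redundant),
# or both features are needed together (complementary).
redundant = {frozenset(): 0.0, frozenset({"x1"}): 1.0,
             frozenset({"x2"}): 1.0, frozenset({"x1", "x2"}): 1.0}
complementary = {frozenset(): 0.0, frozenset({"x1"}): 0.0,
                 frozenset({"x2"}): 0.0, frozenset({"x1", "x2"}): 1.0}

print(interaction_index(redundant, FEATURES, ("x1", "x2")))
print(interaction_index(complementary, FEATURES, ("x1", "x2")))
print(interaction_index(redundant, FEATURES, ("x1",)))
```

Under this definition, a negative pairwise index indicates redundancy and a positive one indicates complementarity. The enumeration is exponential in the number of features, so the sketch is only practical for small feature sets.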

Similar resources

Classification of Right/Left Hand Motor Imagery by Effective Connectivity Based on Transfer Entropy in EEG Signal

The right and left hand Motor Imagery (MI) analysis based on the electroencephalogram (EEG) signal can directly link the central nervous system to a computer or a device. This study aims to identify a set of robust and nonlinear effective brain connectivity features quantified by transfer entropy (TE) to characterize the relationship between brain regions from EEG signals and create a hierarchi...

Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms

One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification by using feature selection and feature space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...

Lexical Semantics and Selection of TAM in Bantu Languages: A Case of Semantic Classification of Kiswahili Verbs

The existing literature on Bantu verbal semantics has demonstrated that the inherent semantic content of verbs pairs directly with the selection of tense, aspect and modality formatives in Bantu languages like Chasu, Lucazi, Lusamia, and Shiyeyi. Thus, the gist of this paper is the articulation of a semantic classification of verbs in Kiswahili based on the selection of TAM types. This is because the sem...

Negative Selection Based Data Classification with Flexible Boundaries

One of the most important artificial immune algorithms is the negative selection algorithm, an anomaly detection and pattern recognition technique; recent research has also shown the successful application of this algorithm in data classification. Most negative selection methods consider deterministic boundaries to distinguish between self and non-self spaces. In this paper, two...

A Random Forest Classifier based on Genetic Algorithm for Cardiovascular Diseases Diagnosis (RESEARCH NOTE)

Machine learning-based classification techniques provide support for the decision-making process in the field of healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous in nature, and their high dimensionality leads to a slower learning rate and higher computational cost. Feature selection is expected to deal with the high dimen...

Journal:
  • Fuzzy Sets and Systems

Volume 270

Publication date: 2015